A Word Analysis System for German Hyphenation, Full Text Search, and Spell Checking, with Regard to the Latest Reform of German Orthography

Author

  • Gabriele Kodydek
Abstract

In text processing systems, German words require special treatment because compound words can be formed by combining existing words. To this end, a universal word analysis system is introduced that analyzes all words in German texts according to their atomic components. A recursive decomposition algorithm, following the rules for inflection, derivation, and compound formation in the German language, splits words into their smallest relevant parts (atoms), which are stored in an atom table. The system is based on the foundations described in this article and is used for reliable, sense-conveying hyphenation, for sense-conveying full text search, and, in limited form, as a spelling checker.
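The recursive decomposition idea can be illustrated with a minimal sketch. This is not the system described in the article: the atom table contents, the three atom classes (`STEMS`, `LINKING`, `ENDINGS`), and the function name `decompose` are all hypothetical, and the combination rules are reduced to a toy grammar (stems may follow each other directly or be joined by a linking element, and an inflectional ending may only close the word).

```python
# Toy atom table, split by atom class. Contents are illustrative assumptions,
# not the article's actual table.
STEMS = {"haus", "tür", "schlüssel", "arbeit", "zimmer"}
LINKING = {"s"}              # linking element between compound parts
ENDINGS = {"e", "en", "er"}  # inflectional endings, allowed only at word end


def decompose(word):
    """Return every split of `word` into atoms that the toy rules allow."""
    word = word.lower()

    def stem_first(rest):
        # Try every prefix of `rest` that is a known stem, then recurse.
        results = []
        for i in range(1, len(rest) + 1):
            head = rest[:i]
            if head in STEMS:
                results += [[head] + tail for tail in after_stem(rest[i:])]
        return results

    def after_stem(rest):
        # After a stem the word may end, take a final ending,
        # or continue with (linking element +) another stem.
        if rest == "":
            return [[]]
        results = []
        if rest in ENDINGS:
            results.append([rest])
        for link in LINKING:
            if rest.startswith(link) and len(rest) > len(link):
                tails = stem_first(rest[len(link):])
                results += [[link] + tail for tail in tails]
        results += stem_first(rest)
        return results

    return stem_first(word)


print(decompose("Haustürschlüssel"))
print(decompose("Arbeitszimmer"))
print(decompose("Türen"))
print(decompose("xyz"))
```

In this sketch, a word with at least one decomposition would be accepted by a limited spelling check, and the atom boundaries are the candidate positions for sense-conveying hyphenation. A word with no decomposition, such as `xyz`, yields an empty list.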


Similar resources

A Language-Sensitive Text Editor for Dutch

Modern word processors begin to offer a range of facilities for spelling, grammar and style checking in English. For the Dutch language hardly anything is available as yet. Many commercial word processing packages do include a hyphenation routine and a lexicon-based spelling checker but the practical usefulness of these tools is limited due to certain properties of Dutch orthography, as we will...


Testing a Word Analysis System for Reliable and Sense-Conveying Hyphenation and Other Applications

In this article, we present a test environment for a word analysis system that is used for reliable and sense-conveying hyphenation of German words. A crucial task is the hyphenation of compound words, a huge set of which can readily be formed from existing words. Due to this fact, testing and checking all existing words for correct hyphenation is infeasible. Therefore we have developed special...


Design and implementation of Persian spelling detection and correction system based on Semantic

The Persian language has special features (grapheme, homophone, and multi-shape clinging characters) in electronic devices. Furthermore, the design and implementation of NLP tools for Persian are more challenging than for other languages (e.g. English or German). Spelling tools are widely used for editing user texts such as emails and text in editors. Also developing Persian tools will provide Persian progr...


Si3Trenn and Si3Silb: Using the SiSiSi Word Analysis System for Pre-hyphenation and Syllable Counting in German Documents

We present two applications of a word analysis system for the German language: pre-hyphenation of documents in various formats, and counting the syllables of all words of a document. The Si3Trenn preprocessor provides pre-hyphenation for file formats allowing for soft hyphens (currently: plain text, LaTeX, RTF). It applies reliable, sense-conveying hyphenation (SiSiSi) to each word of the input t...


Investigating lexical competition - An Empirical Case Study of the German Spelling Reform of 1996/2004/2006

The German spelling reform of 1996/2004/2006 triggered the introduction of new orthographic variants in the German spelling system. These were the products of different kinds of modifications enacted by the reform. They could be a result of a `mutation'-like change of some of the characters of a word (as, for example, the change from Biographie to Biografie), due to a writing as two words o...




Publication date: 2000